# Efficient training

## Gemma 2 9b It WPO HB
A large language model fine-tuned from gemma-2-9b-it using the Weighted Preference Optimization (WPO) method, which enhances the effectiveness of off-policy preference optimization.
Tags: Large Language Model · Transformers
Author: wzhouad · Downloads: 15 · Likes: 36
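As a hedged illustration, the sketch below shows how a chat checkpoint like this would typically be loaded and queried with the Hugging Face transformers API. The Hub id `wzhouad/gemma-2-9b-it-WPO-HB`, the dtype, and the prompt are assumptions inferred from the listing, not confirmed by it.

```python
# Minimal sketch: loading a WPO-tuned Gemma-2 chat model with transformers.
# The repo id below is an assumption based on the listing (author wzhouad).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "wzhouad/gemma-2-9b-it-WPO-HB"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Gemma-2 instruct checkpoints ship a chat template, so format the prompt with it.
messages = [{"role": "user", "content": "Summarize what preference optimization does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```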
## Llmc Gpt2 774M 150B
License: MIT
A 774M-parameter language model based on the GPT-2 architecture, trained on 150 billion tokens from the FineWeb dataset.
Tags: Large Language Model · Transformers · English
Author: mdouglas · Downloads: 18 · Likes: 1
## Rwkv Raven 1b5
RWKV is a large language model architecture that combines the strengths of RNNs and Transformers, supporting efficient training and fast inference with effectively unlimited context length.
Tags: Large Language Model · Transformers
Author: RWKV · Downloads: 428 · Likes: 12
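Since the card lists the checkpoint as Transformers-compatible, a minimal generation sketch follows. The Hub id `RWKV/rwkv-raven-1b5` and the prompt format are assumptions inferred from the listing.

```python
# Minimal sketch: RWKV inference through the transformers API.
# The Hub id is an assumption inferred from the listing (author RWKV).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RWKV/rwkv-raven-1b5"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# The prompt here is illustrative; Raven checkpoints may expect an instruction-style template.
prompt = "Question: What makes RNN-style inference memory-efficient?\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```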
## Gerbil A 32m
License: Apache-2.0
Gerbil-A-32m is an A-grade model with 32 million parameters, trained on 640 million tokens, suitable for various natural language processing tasks.
Tags: Large Language Model · Transformers
Author: GerbilLab · Downloads: 33 · Likes: 2
## Deta Swin Large
DETA is a Transformer-based object detection model that achieves rapid convergence and efficient detection by reintroducing IoU-based assignment and NMS into the detection pipeline.
Tags: Object Detection · Transformers
Author: jozhang97 · Downloads: 2,741 · Likes: 15
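A minimal sketch of running detection with this checkpoint through the standard transformers object-detection pattern follows. The Hub id `jozhang97/deta-swin-large` and the sample image URL are assumptions for illustration.

```python
# Minimal sketch: object detection with a DETA checkpoint via transformers.
# Hub id and the sample image URL are assumptions for illustration.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, DetaForObjectDetection

repo_id = "jozhang97/deta-swin-large"  # assumed Hub id
url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # sample COCO image
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained(repo_id)
model = DetaForObjectDetection.from_pretrained(repo_id)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs to (label, score, box) triples above a confidence threshold.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.5
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```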
## Roberta Base Wechsel German
License: MIT
A German RoBERTa model trained with the WECHSEL method, which achieves cross-lingual transfer of monolingual language models through effective initialization of subword embeddings.
Tags: Large Language Model · Transformers · German
Author: benjamin · Downloads: 96 · Likes: 7
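A masked-token query is the natural way to exercise a RoBERTa checkpoint like this. The sketch below assumes the Hub id `benjamin/roberta-base-wechsel-german` and an example sentence chosen for illustration.

```python
# Minimal sketch: masked-token prediction with the WECHSEL German RoBERTa.
# Hub id and the example sentence are assumptions for illustration.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="benjamin/roberta-base-wechsel-german")

# RoBERTa tokenizers use "<mask>" as the mask token.
for prediction in fill_mask("Die Hauptstadt von Deutschland ist <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```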
## Gpt2 Wechsel French
License: MIT
A French GPT-2 model trained with the WECHSEL method, which achieves cross-lingual transfer of monolingual language models through effective initialization of subword embeddings.
Tags: Large Language Model · Transformers · French
Author: benjamin · Downloads: 33 · Likes: 4
## Gpt2 Wechsel Chinese
License: MIT
A Chinese GPT-2 model trained with the WECHSEL method, which achieves cross-lingual transfer of monolingual language models through effective initialization of subword embeddings.
Tags: Large Language Model · Transformers · Chinese
Author: benjamin · Downloads: 19 · Likes: 4
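The two GPT-2 WECHSEL checkpoints above are plain causal language models, so one short generation sketch covers both. The Hub ids `benjamin/gpt2-wechsel-french` and `benjamin/gpt2-wechsel-chinese`, and the prompts, are assumptions inferred from the listing.

```python
# Minimal sketch: text generation with the WECHSEL GPT-2 checkpoints.
# Hub ids and prompts are assumptions for illustration.
from transformers import pipeline

for repo_id, prompt in [
    ("benjamin/gpt2-wechsel-french", "La tour Eiffel se trouve à"),
    ("benjamin/gpt2-wechsel-chinese", "北京是"),
]:
    generator = pipeline("text-generation", model=repo_id)
    print(generator(prompt, max_new_tokens=30)[0]["generated_text"])
```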
## Bert Tiny Finetuned Stsb
A BERT-tiny model fine-tuned on the STS-B dataset with the M-FAC second-order optimizer, intended for sentence-pair similarity scoring.
Tags: Large Language Model · Transformers
Author: M-FAC · Downloads: 17 · Likes: 1
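STS-B is a sentence-pair regression task with similarity scored roughly from 0 to 5, so the sketch below scores a pair with the sequence-classification head. The Hub id `M-FAC/bert-tiny-finetuned-stsb` and the single-logit regression output are assumptions consistent with typical STS-B fine-tunes, not confirmed by the listing.

```python
# Minimal sketch: scoring sentence similarity with an STS-B fine-tuned BERT-tiny.
# Hub id is an assumption; STS-B fine-tunes typically expose a single regression logit.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "M-FAC/bert-tiny-finetuned-stsb"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

inputs = tokenizer(
    "A man is playing a guitar.",
    "Someone is playing an instrument.",
    return_tensors="pt",
)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # similarity on roughly a 0-5 scale
print(f"similarity ~ {score:.2f}")
```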